Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

نویسندگان

  • Yuan Luo
  • Peter Szolovits
چکیده

In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem into the interval query problem, for which optimal query/update time is in general logarithm. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when being applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Genetics Identification of Genomic Predictors of Atrioventricular Conduction Using Electronic Medical Records as a Tool for Genome Science

Background—Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown. Methods and Results—We performed a genome-wide association study on 2334 European American patients with normal ECGs withou...

متن کامل

Annotating progressive aspect constructions in the spoken section of the British National Corpus

We present a set of stand-off annotations for the ninety thousand sentences in the spoken section of the British National Corpus (BNC) which feature a progressive aspect verb group. These annotations may be matched to the original BNC text using the supplied document and sentence identifiers. The annotated features mostly relate to linguistic form: subject type, subject person and number, form ...

متن کامل

Anaphoric relations in the clinical narrative: corpus creation

OBJECTIVE The long-term goal of this work is the automated discovery of anaphoric relations from the clinical narrative. The creation of a gold standard set from a cross-institutional corpus of clinical notes and high-level characteristics of that gold standard are described. METHODS A standard methodology for annotation guideline development, gold standard annotations, and inter-annotator ag...

متن کامل

Extracting Concepts Related to a Homelessness from the Free Text of VA Electronic Medical Records

Mining the free text of electronic medical records (EMR) using natural language processing (NLP) is an effective method of extracting information not always captured in administrative data. We sought to determine if concepts related to homelessness, a non-medical condition, were amenable to extraction from the EMR of Veterans Affairs (VA) medical records. As there were no off-the-shelf products...

متن کامل

Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies-the Unstructured Information Mana...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2016